{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Day 2 AM: Analyzing data with `dplyr`" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "suppressPackageStartupMessages(library(tidyverse))\n", "suppressPackageStartupMessages(library(stringr))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chaining data transformations with pipe (`%>%`)\n", "\n", "We will operate on data incrementally, step by step. At each step, we take a `data.frame`, apply a function to it, and get back a new `data.frame`. That result can in turn be passed to another function, leading to a chain of operations that each take a `data.frame` as input and return a `data.frame` as output. A convenient idiom (borrowed from the Unix shell) is to connect adjacent functions in the chain with a **pipe**, which takes the output of one function and feeds it as the first argument of the next. The pipe operator used here is `%>%`, provided by the `magrittr` package and loaded with the tidyverse." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A simple piping example\n", "\n", "We first display the first 10 rows of the iris `data.frame` with `head`, then use piping to show only rows 6-10." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n", "\t 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n", "\t 5.0 & 3.6 & 1.4 & 0.2 & setosa\\\\\n", "\t 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n", "\t 4.6 & 3.4 & 1.4 & 0.3 & setosa\\\\\n", "\t 5.0 & 3.4 & 1.5 & 0.2 & setosa\\\\\n", "\t 4.4 & 2.9 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.9 & 3.1 & 1.5 & 0.1 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|---|---|---|---|---|---|---|\n", "| 5.1 | 3.5 | 1.4 | 0.2 | setosa | \n", "| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n", "| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n", "| 4.6 | 3.1 | 1.5 | 0.2 | setosa | \n", "| 5.0 | 3.6 | 1.4 | 0.2 | setosa | \n", "| 5.4 | 3.9 | 1.7 | 0.4 | setosa | \n", "| 4.6 | 3.4 | 1.4 | 0.3 | setosa | \n", "| 5.0 | 3.4 | 1.5 | 0.2 | setosa | \n", "| 4.4 | 2.9 | 1.4 | 0.2 | setosa | \n", "| 4.9 | 3.1 | 1.5 | 0.1 | setosa | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "1 5.1 3.5 1.4 0.2 setosa \n", "2 4.9 3.0 1.4 0.2 setosa \n", "3 4.7 3.2 1.3 0.2 setosa \n", "4 4.6 3.1 1.5 0.2 setosa \n", "5 5.0 3.6 1.4 0.2 setosa \n", "6 5.4 3.9 1.7 0.4 setosa \n", "7 4.6 3.4 1.4 0.3 setosa \n", "8 5.0 3.4 1.5 0.2 setosa \n", "9 4.4 2.9 1.4 0.2 setosa \n", "10 4.9 3.1 1.5 0.1 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(iris, n=10)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " & Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t6 & 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n", "\t7 & 4.6 & 3.4 & 1.4 & 0.3 & setosa\\\\\n", "\t8 & 5.0 & 3.4 & 1.5 & 0.2 & setosa\\\\\n", "\t9 & 4.4 & 2.9 & 1.4 & 0.2 & setosa\\\\\n", "\t10 & 4.9 & 3.1 & 1.5 & 0.1 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|---|---|\n", "| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa | \n", "| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa | \n", "| 8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa | \n", "| 9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa | \n", "| 10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "6 5.4 3.9 1.7 0.4 setosa \n", "7 4.6 3.4 1.4 0.3 setosa \n", "8 5.0 3.4 1.5 0.2 setosa \n", "9 4.4 2.9 1.4 0.2 setosa \n", "10 4.9 3.1 1.5 0.1 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% head(n=10) %>% tail(n=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering rows with `filter`" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 7.0 & 3.2 & 4.7 & 1.4 & versicolor\\\\\n", "\t 6.4 & 3.2 & 4.5 & 1.5 & versicolor\\\\\n", "\t 6.9 & 3.1 & 4.9 & 1.5 & versicolor\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|\n", "| 7.0 | 3.2 | 4.7 | 1.4 | versicolor | \n", "| 6.4 | 3.2 | 4.5 | 1.5 | versicolor | \n", "| 6.9 | 3.1 | 4.9 | 1.5 | versicolor | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n", "1 7.0 3.2 4.7 1.4 versicolor\n", "2 6.4 3.2 4.5 1.5 versicolor\n", "3 6.9 3.1 4.9 1.5 versicolor" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% filter(Species == \"versicolor\") %>% head(3)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 7.0 & 3.2 & 4.7 & 1.4 & versicolor\\\\\n", "\t 6.4 & 3.2 & 4.5 & 1.5 & versicolor\\\\\n", "\t 6.9 & 3.1 & 4.9 & 1.5 & versicolor\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|\n", "| 7.0 | 3.2 | 4.7 | 1.4 | versicolor | \n", "| 6.4 | 3.2 | 4.5 | 1.5 | versicolor | \n", "| 6.9 | 3.1 | 4.9 | 1.5 | versicolor | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n", "1 7.0 3.2 4.7 1.4 versicolor\n", "2 6.4 3.2 4.5 1.5 versicolor\n", "3 6.9 3.1 4.9 1.5 versicolor" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% filter(Sepal.Length > 6) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6.3 3.3 6.0 2.5 virginica
7.1 3.0 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 6.3 & 3.3 & 6.0 & 2.5 & virginica\\\\\n", "\t 7.1 & 3.0 & 5.9 & 2.1 & virginica\\\\\n", "\t 6.3 & 2.9 & 5.6 & 1.8 & virginica\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|\n", "| 6.3 | 3.3 | 6.0 | 2.5 | virginica | \n", "| 7.1 | 3.0 | 5.9 | 2.1 | virginica | \n", "| 6.3 | 2.9 | 5.6 | 1.8 | virginica | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n", "1 6.3 3.3 6.0 2.5 virginica\n", "2 7.1 3.0 5.9 2.1 virginica\n", "3 6.3 2.9 5.6 1.8 virginica" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% filter((Sepal.Length > 6) & (Species == \"virginica\")) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 7.0 & 3.2 & 4.7 & 1.4 & versicolor\\\\\n", "\t 6.4 & 3.2 & 4.5 & 1.5 & versicolor\\\\\n", "\t 6.9 & 3.1 & 4.9 & 1.5 & versicolor\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|\n", "| 7.0 | 3.2 | 4.7 | 1.4 | versicolor | \n", "| 6.4 | 3.2 | 4.5 | 1.5 | versicolor | \n", "| 6.9 | 3.1 | 4.9 | 1.5 | versicolor | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n", "1 7.0 3.2 4.7 1.4 versicolor\n", "2 6.4 3.2 4.5 1.5 versicolor\n", "3 6.9 3.1 4.9 1.5 versicolor" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% filter(Sepal.Length > mean(Sepal.Length)) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3.0 5.9 2.1 virginica
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 6.3 & 3.3 & 6.0 & 2.5 & virginica\\\\\n", "\t 5.8 & 2.7 & 5.1 & 1.9 & virginica\\\\\n", "\t 7.1 & 3.0 & 5.9 & 2.1 & virginica\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|\n", "| 6.3 | 3.3 | 6.0 | 2.5 | virginica | \n", "| 5.8 | 2.7 | 5.1 | 1.9 | virginica | \n", "| 7.1 | 3.0 | 5.9 | 2.1 | virginica | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n", "1 6.3 3.3 6.0 2.5 virginica\n", "2 5.8 2.7 5.1 1.9 virginica\n", "3 7.1 3.0 5.9 2.1 virginica" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% filter(str_detect(Species, \"virgin\")) %>% head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Slicing rows by index\n", "\n", "Rows can be selected by position with ordinary indexing, but `slice` does the same job as a step in a pipe chain." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n", "\t 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n", "\t 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n", "\t 4.6 & 3.4 & 1.4 & 0.3 & setosa\\\\\n", "\t 5.0 & 3.4 & 1.5 & 0.2 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|---|---|---|\n", "| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n", "| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n", "| 4.6 | 3.1 | 1.5 | 0.2 | setosa | \n", "| 5.4 | 3.9 | 1.7 | 0.4 | setosa | \n", "| 4.6 | 3.4 | 1.4 | 0.3 | setosa | \n", "| 5.0 | 3.4 | 1.5 | 0.2 | setosa | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "1 4.9 3.0 1.4 0.2 setosa \n", "2 4.7 3.2 1.3 0.2 setosa \n", "3 4.6 3.1 1.5 0.2 setosa \n", "4 5.4 3.9 1.7 0.4 setosa \n", "5 4.6 3.4 1.4 0.3 setosa \n", "6 5.0 3.4 1.5 0.2 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% slice(c(2:4, 6:8))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting columns with `select`" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Petal.Length Petal.Width Sepal.Length Sepal.Width
1.4 0.2 5.1 3.5
1.4 0.2 4.9 3.0
1.3 0.2 4.7 3.2
\n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " Petal.Length & Petal.Width & Sepal.Length & Sepal.Width\\\\\n", "\\hline\n", "\t 1.4 & 0.2 & 5.1 & 3.5\\\\\n", "\t 1.4 & 0.2 & 4.9 & 3.0\\\\\n", "\t 1.3 & 0.2 & 4.7 & 3.2\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Petal.Length | Petal.Width | Sepal.Length | Sepal.Width | \n", "|---|---|---|\n", "| 1.4 | 0.2 | 5.1 | 3.5 | \n", "| 1.4 | 0.2 | 4.9 | 3.0 | \n", "| 1.3 | 0.2 | 4.7 | 3.2 | \n", "\n", "\n" ], "text/plain": [ " Petal.Length Petal.Width Sepal.Length Sepal.Width\n", "1 1.4 0.2 5.1 3.5 \n", "2 1.4 0.2 4.9 3.0 \n", "3 1.3 0.2 4.7 3.2 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% select(c(Petal.Length, Petal.Width, Sepal.Length, Sepal.Width)) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Petal.Length Petal.Width Sepal.Length Sepal.Width
1.4 0.2 5.1 3.5
1.4 0.2 4.9 3.0
1.3 0.2 4.7 3.2
\n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " Petal.Length & Petal.Width & Sepal.Length & Sepal.Width\\\\\n", "\\hline\n", "\t 1.4 & 0.2 & 5.1 & 3.5\\\\\n", "\t 1.4 & 0.2 & 4.9 & 3.0\\\\\n", "\t 1.3 & 0.2 & 4.7 & 3.2\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Petal.Length | Petal.Width | Sepal.Length | Sepal.Width | \n", "|---|---|---|\n", "| 1.4 | 0.2 | 5.1 | 3.5 | \n", "| 1.4 | 0.2 | 4.9 | 3.0 | \n", "| 1.3 | 0.2 | 4.7 | 3.2 | \n", "\n", "\n" ], "text/plain": [ " Petal.Length Petal.Width Sepal.Length Sepal.Width\n", "1 1.4 0.2 5.1 3.5 \n", "2 1.4 0.2 4.9 3.0 \n", "3 1.3 0.2 4.7 3.2 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% select(c(3,4,1,2)) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
\n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width\\\\\n", "\\hline\n", "\t 5.1 & 3.5 & 1.4 & 0.2\\\\\n", "\t 4.9 & 3.0 & 1.4 & 0.2\\\\\n", "\t 4.7 & 3.2 & 1.3 & 0.2\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | \n", "|---|---|---|\n", "| 5.1 | 3.5 | 1.4 | 0.2 | \n", "| 4.9 | 3.0 | 1.4 | 0.2 | \n", "| 4.7 | 3.2 | 1.3 | 0.2 | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width\n", "1 5.1 3.5 1.4 0.2 \n", "2 4.9 3.0 1.4 0.2 \n", "3 4.7 3.2 1.3 0.2 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% select(-Species) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Petal.Length
5.1 1.4
4.9 1.4
4.7 1.3
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Sepal.Length & Petal.Length\\\\\n", "\\hline\n", "\t 5.1 & 1.4\\\\\n", "\t 4.9 & 1.4\\\\\n", "\t 4.7 & 1.3\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Petal.Length | \n", "|---|---|---|\n", "| 5.1 | 1.4 | \n", "| 4.9 | 1.4 | \n", "| 4.7 | 1.3 | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Petal.Length\n", "1 5.1 1.4 \n", "2 4.9 1.4 \n", "3 4.7 1.3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% select(contains(\"Length\")) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Species
5.1 3.5 setosa
4.9 3.0 setosa
4.7 3.2 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " Sepal.Length & Sepal.Width & Species\\\\\n", "\\hline\n", "\t 5.1 & 3.5 & setosa\\\\\n", "\t 4.9 & 3.0 & setosa\\\\\n", "\t 4.7 & 3.2 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Species | \n", "|---|---|---|\n", "| 5.1 | 3.5 | setosa | \n", "| 4.9 | 3.0 | setosa | \n", "| 4.7 | 3.2 | setosa | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Species\n", "1 5.1 3.5 setosa \n", "2 4.9 3.0 setosa \n", "3 4.7 3.2 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% select(starts_with(\"S\")) %>% head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Renaming columns" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Type
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Type\\\\\n", "\\hline\n", "\t 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Type | \n", "|---|---|---|\n", "| 5.1 | 3.5 | 1.4 | 0.2 | setosa | \n", "| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n", "| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Type \n", "1 5.1 3.5 1.4 0.2 setosa\n", "2 4.9 3.0 1.4 0.2 setosa\n", "3 4.7 3.2 1.3 0.2 setosa" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% rename(Type = Species) %>% head(3)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
SL SW PL PW Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " SL & SW & PL & PW & Species\\\\\n", "\\hline\n", "\t 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "SL | SW | PL | PW | Species | \n", "|---|---|---|\n", "| 5.1 | 3.5 | 1.4 | 0.2 | setosa | \n", "| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n", "| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n", "\n", "\n" ], "text/plain": [ " SL SW PL PW Species\n", "1 5.1 3.5 1.4 0.2 setosa \n", "2 4.9 3.0 1.4 0.2 setosa \n", "3 4.7 3.2 1.3 0.2 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% rename(SL=Sepal.Length, SW=Sepal.Width, PW=Petal.Width, PL=Petal.Length) %>% head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sorting data with `arrange`" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4.3 3.0 1.1 0.1 setosa
4.4 2.9 1.4 0.2 setosa
4.4 3.0 1.3 0.2 setosa
4.4 3.2 1.3 0.2 setosa
4.5 2.3 1.3 0.3 setosa
4.6 3.1 1.5 0.2 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 4.3 & 3.0 & 1.1 & 0.1 & setosa\\\\\n", "\t 4.4 & 2.9 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.4 & 3.0 & 1.3 & 0.2 & setosa\\\\\n", "\t 4.4 & 3.2 & 1.3 & 0.2 & setosa\\\\\n", "\t 4.5 & 2.3 & 1.3 & 0.3 & setosa\\\\\n", "\t 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|---|---|---|\n", "| 4.3 | 3.0 | 1.1 | 0.1 | setosa | \n", "| 4.4 | 2.9 | 1.4 | 0.2 | setosa | \n", "| 4.4 | 3.0 | 1.3 | 0.2 | setosa | \n", "| 4.4 | 3.2 | 1.3 | 0.2 | setosa | \n", "| 4.5 | 2.3 | 1.3 | 0.3 | setosa | \n", "| 4.6 | 3.1 | 1.5 | 0.2 | setosa | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "1 4.3 3.0 1.1 0.1 setosa \n", "2 4.4 2.9 1.4 0.2 setosa \n", "3 4.4 3.0 1.3 0.2 setosa \n", "4 4.4 3.2 1.3 0.2 setosa \n", "5 4.5 2.3 1.3 0.3 setosa \n", "6 4.6 3.1 1.5 0.2 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% arrange(Sepal.Length) %>% head" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4.3 3.0 1.1 0.1 setosa
4.4 3.2 1.3 0.2 setosa
4.4 3.0 1.3 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.5 2.3 1.3 0.3 setosa
4.6 3.6 1.0 0.2 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 4.3 & 3.0 & 1.1 & 0.1 & setosa\\\\\n", "\t 4.4 & 3.2 & 1.3 & 0.2 & setosa\\\\\n", "\t 4.4 & 3.0 & 1.3 & 0.2 & setosa\\\\\n", "\t 4.4 & 2.9 & 1.4 & 0.2 & setosa\\\\\n", "\t 4.5 & 2.3 & 1.3 & 0.3 & setosa\\\\\n", "\t 4.6 & 3.6 & 1.0 & 0.2 & setosa\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|---|---|---|\n", "| 4.3 | 3.0 | 1.1 | 0.1 | setosa | \n", "| 4.4 | 3.2 | 1.3 | 0.2 | setosa | \n", "| 4.4 | 3.0 | 1.3 | 0.2 | setosa | \n", "| 4.4 | 2.9 | 1.4 | 0.2 | setosa | \n", "| 4.5 | 2.3 | 1.3 | 0.3 | setosa | \n", "| 4.6 | 3.6 | 1.0 | 0.2 | setosa | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "1 4.3 3.0 1.1 0.1 setosa \n", "2 4.4 3.2 1.3 0.2 setosa \n", "3 4.4 3.0 1.3 0.2 setosa \n", "4 4.4 2.9 1.4 0.2 setosa \n", "5 4.5 2.3 1.3 0.3 setosa \n", "6 4.6 3.6 1.0 0.2 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% arrange(Sepal.Length, desc(Sepal.Width)) %>% head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating new columns with `mutate` and `transmute`" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Comb.Length Comb.Width
5.1 3.5 1.4 0.2 setosa 6.5 3.7
4.9 3.0 1.4 0.2 setosa 6.3 3.2
4.7 3.2 1.3 0.2 setosa 6.0 3.4
4.6 3.1 1.5 0.2 setosa 6.1 3.3
5.0 3.6 1.4 0.2 setosa 6.4 3.8
5.4 3.9 1.7 0.4 setosa 7.1 4.3
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species & Comb.Length & Comb.Width\\\\\n", "\\hline\n", "\t 5.1 & 3.5 & 1.4 & 0.2 & setosa & 6.5 & 3.7 \\\\\n", "\t 4.9 & 3.0 & 1.4 & 0.2 & setosa & 6.3 & 3.2 \\\\\n", "\t 4.7 & 3.2 & 1.3 & 0.2 & setosa & 6.0 & 3.4 \\\\\n", "\t 4.6 & 3.1 & 1.5 & 0.2 & setosa & 6.1 & 3.3 \\\\\n", "\t 5.0 & 3.6 & 1.4 & 0.2 & setosa & 6.4 & 3.8 \\\\\n", "\t 5.4 & 3.9 & 1.7 & 0.4 & setosa & 7.1 & 4.3 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | Comb.Length | Comb.Width | \n", "|---|---|---|---|---|---|\n", "| 5.1 | 3.5 | 1.4 | 0.2 | setosa | 6.5 | 3.7 | \n", "| 4.9 | 3.0 | 1.4 | 0.2 | setosa | 6.3 | 3.2 | \n", "| 4.7 | 3.2 | 1.3 | 0.2 | setosa | 6.0 | 3.4 | \n", "| 4.6 | 3.1 | 1.5 | 0.2 | setosa | 6.1 | 3.3 | \n", "| 5.0 | 3.6 | 1.4 | 0.2 | setosa | 6.4 | 3.8 | \n", "| 5.4 | 3.9 | 1.7 | 0.4 | setosa | 7.1 | 4.3 | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species Comb.Length\n", "1 5.1 3.5 1.4 0.2 setosa 6.5 \n", "2 4.9 3.0 1.4 0.2 setosa 6.3 \n", "3 4.7 3.2 1.3 0.2 setosa 6.0 \n", "4 4.6 3.1 1.5 0.2 setosa 6.1 \n", "5 5.0 3.6 1.4 0.2 setosa 6.4 \n", "6 5.4 3.9 1.7 0.4 setosa 7.1 \n", " Comb.Width\n", "1 3.7 \n", "2 3.2 \n", "3 3.4 \n", "4 3.3 \n", "5 3.8 \n", "6 4.3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% mutate(Comb.Length=Sepal.Length + Petal.Length, \n", " Comb.Width = Sepal.Width + Petal.Width) %>% head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mutate only columns where condition is TRUE" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1.629241 1.252763 0.3364722 -1.6094379 setosa
1.589235 1.098612 0.3364722 -1.6094379 setosa
1.547563 1.163151 0.2623643 -1.6094379 setosa
1.526056 1.131402 0.4054651 -1.6094379 setosa
1.609438 1.280934 0.3364722 -1.6094379 setosa
1.686399 1.360977 0.5306283 -0.9162907 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 1.629241 & 1.252763 & 0.3364722 & -1.6094379 & setosa \\\\\n", "\t 1.589235 & 1.098612 & 0.3364722 & -1.6094379 & setosa \\\\\n", "\t 1.547563 & 1.163151 & 0.2623643 & -1.6094379 & setosa \\\\\n", "\t 1.526056 & 1.131402 & 0.4054651 & -1.6094379 & setosa \\\\\n", "\t 1.609438 & 1.280934 & 0.3364722 & -1.6094379 & setosa \\\\\n", "\t 1.686399 & 1.360977 & 0.5306283 & -0.9162907 & setosa \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|---|---|---|\n", "| 1.629241 | 1.252763 | 0.3364722 | -1.6094379 | setosa | \n", "| 1.589235 | 1.098612 | 0.3364722 | -1.6094379 | setosa | \n", "| 1.547563 | 1.163151 | 0.2623643 | -1.6094379 | setosa | \n", "| 1.526056 | 1.131402 | 0.4054651 | -1.6094379 | setosa | \n", "| 1.609438 | 1.280934 | 0.3364722 | -1.6094379 | setosa | \n", "| 1.686399 | 1.360977 | 0.5306283 | -0.9162907 | setosa | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "1 1.629241 1.252763 0.3364722 -1.6094379 setosa \n", "2 1.589235 1.098612 0.3364722 -1.6094379 setosa \n", "3 1.547563 1.163151 0.2623643 -1.6094379 setosa \n", "4 1.526056 1.131402 0.4054651 -1.6094379 setosa \n", "5 1.609438 1.280934 0.3364722 -1.6094379 setosa \n", "6 1.686399 1.360977 0.5306283 -0.9162907 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% mutate_if(is.numeric, log) %>% head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mutate columns that meet string criteria" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1.629241 3.5 0.3364722 0.2 setosa
1.589235 3.0 0.3364722 0.2 setosa
1.547563 3.2 0.2623643 0.2 setosa
1.526056 3.1 0.4054651 0.2 setosa
1.609438 3.6 0.3364722 0.2 setosa
1.686399 3.9 0.5306283 0.4 setosa
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n", "\\hline\n", "\t 1.629241 & 3.5 & 0.3364722 & 0.2 & setosa \\\\\n", "\t 1.589235 & 3.0 & 0.3364722 & 0.2 & setosa \\\\\n", "\t 1.547563 & 3.2 & 0.2623643 & 0.2 & setosa \\\\\n", "\t 1.526056 & 3.1 & 0.4054651 & 0.2 & setosa \\\\\n", "\t 1.609438 & 3.6 & 0.3364722 & 0.2 & setosa \\\\\n", "\t 1.686399 & 3.9 & 0.5306283 & 0.4 & setosa \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n", "|---|---|---|---|---|---|\n", "| 1.629241 | 3.5 | 0.3364722 | 0.2 | setosa | \n", "| 1.589235 | 3.0 | 0.3364722 | 0.2 | setosa | \n", "| 1.547563 | 3.2 | 0.2623643 | 0.2 | setosa | \n", "| 1.526056 | 3.1 | 0.4054651 | 0.2 | setosa | \n", "| 1.609438 | 3.6 | 0.3364722 | 0.2 | setosa | \n", "| 1.686399 | 3.9 | 0.5306283 | 0.4 | setosa | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "1 1.629241 3.5 0.3364722 0.2 setosa \n", "2 1.589235 3.0 0.3364722 0.2 setosa \n", "3 1.547563 3.2 0.2623643 0.2 setosa \n", "4 1.526056 3.1 0.4054651 0.2 setosa \n", "5 1.609438 3.6 0.3364722 0.2 setosa \n", "6 1.686399 3.9 0.5306283 0.4 setosa " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% mutate_at(c(\"Sepal.Length\", \"Petal.Length\"), log) %>% head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Only keep mutated columns" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Comb.Length Comb.Width
6.5 3.7
6.3 3.2
6.0 3.4
6.1 3.3
6.4 3.8
7.1 4.3
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Comb.Length & Comb.Width\\\\\n", "\\hline\n", "\t 6.5 & 3.7\\\\\n", "\t 6.3 & 3.2\\\\\n", "\t 6.0 & 3.4\\\\\n", "\t 6.1 & 3.3\\\\\n", "\t 6.4 & 3.8\\\\\n", "\t 7.1 & 4.3\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Comb.Length | Comb.Width | \n", "|---|---|---|---|---|---|\n", "| 6.5 | 3.7 | \n", "| 6.3 | 3.2 | \n", "| 6.0 | 3.4 | \n", "| 6.1 | 3.3 | \n", "| 6.4 | 3.8 | \n", "| 7.1 | 4.3 | \n", "\n", "\n" ], "text/plain": [ " Comb.Length Comb.Width\n", "1 6.5 3.7 \n", "2 6.3 3.2 \n", "3 6.0 3.4 \n", "4 6.1 3.3 \n", "5 6.4 3.8 \n", "6 7.1 4.3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% transmute(Comb.Length=Sepal.Length + Petal.Length,\n", " Comb.Width = Sepal.Width + Petal.Width) %>% head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Multiple transformations" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Sepal.Length_log Sepal.Width_log Petal.Length_log Petal.Width_log Sepal.Length_sqrt Sepal.Width_sqrt Petal.Length_sqrt Petal.Width_sqrt
1.629241 1.252763 0.3364722 -1.6094379 2.258318 1.870829 1.183216 0.4472136
1.589235 1.098612 0.3364722 -1.6094379 2.213594 1.732051 1.183216 0.4472136
1.547563 1.163151 0.2623643 -1.6094379 2.167948 1.788854 1.140175 0.4472136
1.526056 1.131402 0.4054651 -1.6094379 2.144761 1.760682 1.224745 0.4472136
1.609438 1.280934 0.3364722 -1.6094379 2.236068 1.897367 1.183216 0.4472136
1.686399 1.360977 0.5306283 -0.9162907 2.323790 1.974842 1.303840 0.6324555
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllll}\n", " Sepal.Length\\_log & Sepal.Width\\_log & Petal.Length\\_log & Petal.Width\\_log & Sepal.Length\\_sqrt & Sepal.Width\\_sqrt & Petal.Length\\_sqrt & Petal.Width\\_sqrt\\\\\n", "\\hline\n", "\t 1.629241 & 1.252763 & 0.3364722 & -1.6094379 & 2.258318 & 1.870829 & 1.183216 & 0.4472136 \\\\\n", "\t 1.589235 & 1.098612 & 0.3364722 & -1.6094379 & 2.213594 & 1.732051 & 1.183216 & 0.4472136 \\\\\n", "\t 1.547563 & 1.163151 & 0.2623643 & -1.6094379 & 2.167948 & 1.788854 & 1.140175 & 0.4472136 \\\\\n", "\t 1.526056 & 1.131402 & 0.4054651 & -1.6094379 & 2.144761 & 1.760682 & 1.224745 & 0.4472136 \\\\\n", "\t 1.609438 & 1.280934 & 0.3364722 & -1.6094379 & 2.236068 & 1.897367 & 1.183216 & 0.4472136 \\\\\n", "\t 1.686399 & 1.360977 & 0.5306283 & -0.9162907 & 2.323790 & 1.974842 & 1.303840 & 0.6324555 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length_log | Sepal.Width_log | Petal.Length_log | Petal.Width_log | Sepal.Length_sqrt | Sepal.Width_sqrt | Petal.Length_sqrt | Petal.Width_sqrt | \n", "|---|---|---|---|---|---|\n", "| 1.629241 | 1.252763 | 0.3364722 | -1.6094379 | 2.258318 | 1.870829 | 1.183216 | 0.4472136 | \n", "| 1.589235 | 1.098612 | 0.3364722 | -1.6094379 | 2.213594 | 1.732051 | 1.183216 | 0.4472136 | \n", "| 1.547563 | 1.163151 | 0.2623643 | -1.6094379 | 2.167948 | 1.788854 | 1.140175 | 0.4472136 | \n", "| 1.526056 | 1.131402 | 0.4054651 | -1.6094379 | 2.144761 | 1.760682 | 1.224745 | 0.4472136 | \n", "| 1.609438 | 1.280934 | 0.3364722 | -1.6094379 | 2.236068 | 1.897367 | 1.183216 | 0.4472136 | \n", "| 1.686399 | 1.360977 | 0.5306283 | -0.9162907 | 2.323790 | 1.974842 | 1.303840 | 0.6324555 | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length_log Sepal.Width_log Petal.Length_log Petal.Width_log\n", "1 1.629241 1.252763 0.3364722 -1.6094379 \n", "2 1.589235 1.098612 0.3364722 -1.6094379 \n", "3 1.547563 1.163151 0.2623643 -1.6094379 \n", "4 1.526056 1.131402 0.4054651 -1.6094379 \n", "5 
1.609438 1.280934 0.3364722 -1.6094379 \n", "6 1.686399 1.360977 0.5306283 -0.9162907 \n", " Sepal.Length_sqrt Sepal.Width_sqrt Petal.Length_sqrt Petal.Width_sqrt\n", "1 2.258318 1.870829 1.183216 0.4472136 \n", "2 2.213594 1.732051 1.183216 0.4472136 \n", "3 2.167948 1.788854 1.140175 0.4472136 \n", "4 2.144761 1.760682 1.224745 0.4472136 \n", "5 2.236068 1.897367 1.183216 0.4472136 \n", "6 2.323790 1.974842 1.303840 0.6324555 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% transmute_if(is.numeric, funs(log, sqrt)) %>% head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Split-apply-combine with `group_by` and `summarize`" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
mean
5.843333
\n" ], "text/latex": [ "\\begin{tabular}{r|l}\n", " mean\\\\\n", "\\hline\n", "\t 5.843333\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "mean | \n", "|---|\n", "| 5.843333 | \n", "\n", "\n" ], "text/plain": [ " mean \n", "1 5.843333" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% summarise(mean = mean(Sepal.Length)) %>% head" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
Sepal.Length Sepal.Width Petal.Length Petal.Width
876.5 458.6 563.7 179.9
\n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " Sepal.Length & Sepal.Width & Petal.Length & Petal.Width\\\\\n", "\\hline\n", "\t 876.5 & 458.6 & 563.7 & 179.9\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | \n", "|---|\n", "| 876.5 | 458.6 | 563.7 | 179.9 | \n", "\n", "\n" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width\n", "1 876.5 458.6 563.7 179.9 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% summarise_if(is.numeric, sum) %>% head" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Split-apply-combine" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Species count
setosa 50
versicolor 50
virginica 50
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Species & count\\\\\n", "\\hline\n", "\t setosa & 50 \\\\\n", "\t versicolor & 50 \\\\\n", "\t virginica & 50 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Species | count | \n", "|---|---|---|\n", "| setosa | 50 | \n", "| versicolor | 50 | \n", "| virginica | 50 | \n", "\n", "\n" ], "text/plain": [ " Species count\n", "1 setosa 50 \n", "2 versicolor 50 \n", "3 virginica 50 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% \n", "group_by(Species) %>% \n", "summarise(count = n())" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Species SW.mean SW.cv
setosa 3.428 0.1105789
versicolor 2.770 0.1132846
virginica 2.974 0.1084387
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " Species & SW.mean & SW.cv\\\\\n", "\\hline\n", "\t setosa & 3.428 & 0.1105789 \\\\\n", "\t versicolor & 2.770 & 0.1132846 \\\\\n", "\t virginica & 2.974 & 0.1084387 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Species | SW.mean | SW.cv | \n", "|---|---|---|\n", "| setosa | 3.428 | 0.1105789 | \n", "| versicolor | 2.770 | 0.1132846 | \n", "| virginica | 2.974 | 0.1084387 | \n", "\n", "\n" ], "text/plain": [ " Species SW.mean SW.cv \n", "1 setosa 3.428 0.1105789\n", "2 versicolor 2.770 0.1132846\n", "3 virginica 2.974 0.1084387" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% \n", "group_by(Species) %>% \n", "summarise(SW.mean = mean(Sepal.Width),\n", " SW.cv = sd(Sepal.Width)/mean(Sepal.Width))" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Species min max mean median
setosa 4.3 5.8 5.006 5.0
versicolor 4.9 7.0 5.936 5.9
virginica 4.9 7.9 6.588 6.5
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " Species & min & max & mean & median\\\\\n", "\\hline\n", "\t setosa & 4.3 & 5.8 & 5.006 & 5.0 \\\\\n", "\t versicolor & 4.9 & 7.0 & 5.936 & 5.9 \\\\\n", "\t virginica & 4.9 & 7.9 & 6.588 & 6.5 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Species | min | max | mean | median | \n", "|---|---|---|\n", "| setosa | 4.3 | 5.8 | 5.006 | 5.0 | \n", "| versicolor | 4.9 | 7.0 | 5.936 | 5.9 | \n", "| virginica | 4.9 | 7.9 | 6.588 | 6.5 | \n", "\n", "\n" ], "text/plain": [ " Species min max mean median\n", "1 setosa 4.3 5.8 5.006 5.0 \n", "2 versicolor 4.9 7.0 5.936 5.9 \n", "3 virginica 4.9 7.9 6.588 6.5 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% \n", "group_by(Species) %>% \n", "summarise_at(\"Sepal.Length\", funs(min, max, mean, median))" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Species Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min Sepal.Length_max Sepal.Width_max Petal.Length_max Petal.Width_max
setosa 4.3 2.3 1.0 0.1 5.8 4.4 1.9 0.6
versicolor 4.9 2.0 3.0 1.0 7.0 3.4 5.1 1.8
virginica 4.9 2.2 4.5 1.4 7.9 3.8 6.9 2.5
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllll}\n", " Species & Sepal.Length\\_min & Sepal.Width\\_min & Petal.Length\\_min & Petal.Width\\_min & Sepal.Length\\_max & Sepal.Width\\_max & Petal.Length\\_max & Petal.Width\\_max\\\\\n", "\\hline\n", "\t setosa & 4.3 & 2.3 & 1.0 & 0.1 & 5.8 & 4.4 & 1.9 & 0.6 \\\\\n", "\t versicolor & 4.9 & 2.0 & 3.0 & 1.0 & 7.0 & 3.4 & 5.1 & 1.8 \\\\\n", "\t virginica & 4.9 & 2.2 & 4.5 & 1.4 & 7.9 & 3.8 & 6.9 & 2.5 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Species | Sepal.Length_min | Sepal.Width_min | Petal.Length_min | Petal.Width_min | Sepal.Length_max | Sepal.Width_max | Petal.Length_max | Petal.Width_max | \n", "|---|---|---|\n", "| setosa | 4.3 | 2.3 | 1.0 | 0.1 | 5.8 | 4.4 | 1.9 | 0.6 | \n", "| versicolor | 4.9 | 2.0 | 3.0 | 1.0 | 7.0 | 3.4 | 5.1 | 1.8 | \n", "| virginica | 4.9 | 2.2 | 4.5 | 1.4 | 7.9 | 3.8 | 6.9 | 2.5 | \n", "\n", "\n" ], "text/plain": [ " Species Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min\n", "1 setosa 4.3 2.3 1.0 0.1 \n", "2 versicolor 4.9 2.0 3.0 1.0 \n", "3 virginica 4.9 2.2 4.5 1.4 \n", " Sepal.Length_max Sepal.Width_max Petal.Length_max Petal.Width_max\n", "1 5.8 4.4 1.9 0.6 \n", "2 7.0 3.4 5.1 1.8 \n", "3 7.9 3.8 6.9 2.5 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "iris %>% \n", "group_by(Species) %>% \n", "summarise_all(funs(min, max))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.4.0" } }, "nbformat": 4, "nbformat_minor": 2 }